Abstract

We present an adaptation of the standard Grassberger-Procaccia (GP) algorithm for
estimating the correlation dimension of a time series in a non-subjective manner.
The validity and accuracy of this approach are tested using different types of time
series, such as those from standard chaotic systems, pure white and colored noise,
and chaotic systems with added noise. The effectiveness of the scheme in analysing
noisy time series, particularly those involving colored noise, is investigated. An
interesting result we have obtained is that, for the same percentage of noise addition,
data with colored noise is more distinguishable from the corresponding surrogates
than data with white noise. As examples of real-life applications, analyses of data
from an astrophysical X-ray object and of human brain EEG are presented.
1 Introduction



Most of the complex phenomena observed in nature arise from the nonlinear
nature of the underlying dynamics. Hence techniques from nonlinear dynamics
and chaos theory are increasingly being applied to diverse fields such as
biology and economics, where finite time series data of one or two variables are
used, even when the model system remains unknown. One of the most widely
used methods to quantify the nature of such data is the calculation of the
correlation dimension D2 by the Grassberger-Procaccia (GP) algorithm [1,2].
Here the scalar time series is used to reconstruct the dynamics in an embedding
space of dimension M using delay coordinates constructed with a prescribed
time delay τ. The results of the analysis are often useful to indicate whether
the nature of complexity in the data is chaotic, stochastic or a mixture of the
two.

Such studies have been carried out in various disciplines with beneficial outcomes
in understanding complex systems [3-5]. As examples, we mention the
calculation of the correlation dimension from the light curves of variable stars
[6], quantifying and predicting changes in the weather over short periods
[7], analysis of market variables to predict the financial market [8], and estimation
of the dimension of galactic structure in the visible universe [9]. The
dimension values calculated from EEG data of the human brain have helped to
analyze various states of the brain and its possible pathological changes [10].
The chaotic behavior of erythrocyte deformations identified through this technique
in healthy people and dyslipidemic patients helps in their treatment.

In all such applications, the analysis is hampered by the finite length of the
available data [11] and, more importantly, by the presence of noise. The former
results in "edge effects", producing a downward bias in the correlation
exponent estimates [12,13]. The presence of Gaussian white noise tends to fill
the available phase space and hence increases the computed D2 value of the
system. This effect has been studied by many authors [14-16], and it has been
shown that a reasonable estimation of D2(M) is possible only for moderate noise
contamination. However, white noise, which is characterized by a flat power
spectrum, is only one particular case of the various kinds of noise that can exist in
a physical system. More problematic is the presence of colored noise, which is
common in a wide variety of physical and biological systems, for example in
Brownian motion, astrophysical systems, neurons and solid state devices.
This is because pure colored noise also produces a well saturated value of D2
[17]. However, colored noise is basically a correlated stochastic process, which
essentially generates a random fractal curve rather than a fractal attractor,
with a power law dependence P(f) ∝ 1/f^α. The power spectral indices α = 1
and 2 correspond to "pink" and "red" noise respectively, while α = 0 gives
white noise. Its D2 value depends on α through the relation D2 = 2/(α − 1) [17].
A detailed analytic study of colored noise has been undertaken by Theiler [18],
where the scaling of the correlation integral over various length scales
is derived. Also, the effect of colored noise contamination on the computation
of the correlation dimension has been studied recently by Redaelli et al.
[19], who show that the increase in the correlation dimension is smaller for colored
noise than for white noise, and that the scaling region used to compute
the correlation dimension is less affected by colored noise contamination.
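
To make the power-law definition concrete, the Python sketch below generates a colored-noise series by shaping the Fourier amplitudes of Gaussian white noise so that P(f) ∝ 1/f^α, and evaluates the relation D2 = 2/(α − 1) quoted above. The spectral-shaping generator and the names `colored_noise` and `theoretical_d2` are assumptions made for illustration; the text does not state how its colored noise was generated.

```python
import numpy as np

def colored_noise(n, alpha, seed=None):
    """Gaussian noise with power spectrum P(f) ~ 1/f**alpha (spectral shaping).

    White Gaussian noise is Fourier transformed, each amplitude is multiplied by
    f**(-alpha/2) so that the power scales as f**(-alpha), and the result is
    transformed back and standardised to zero mean and unit variance.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)                 # cycles per sample, 0 ... 0.5
    scale = np.zeros_like(freqs)               # zero-frequency (mean) term removed
    scale[1:] = freqs[1:] ** (-alpha / 2.0)
    x = np.fft.irfft(spectrum * scale, n=n)
    return (x - x.mean()) / x.std()

def theoretical_d2(alpha):
    """D2 = 2 / (alpha - 1) as quoted in the text; only meaningful for alpha > 1.

    For alpha <= 1, D2(M) does not saturate, so this sketch returns infinity.
    """
    return float("inf") if alpha <= 1.0 else 2.0 / (alpha - 1.0)
```

For example, `theoretical_d2(2.0)` gives 2 for red noise and `theoretical_d2(1.5)` gives 4. Returning infinity for α ≤ 1 is only a convention of this sketch.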



An important implication of these results is that, since colored noise has a well
defined saturated correlation dimension, further analysis is required to distinguish
low dimensional chaotic data (which may have colored noise contamination)
from a purely noisy time series. The standard technique used to make
such distinctions is surrogate data analysis with a discriminating statistic [20-22].
Here, a number of surrogate data sets are generated which have practically
the same power spectrum and/or distribution of values as the real data.
The real data and the surrogates are then subjected (in principle) to the same
analysis to identify whether the real data is distinct from the surrogates. The null
hypothesis is rejected if the discriminating statistic is different for the data
and the surrogates. The most important measure used for the statistic is D2,
but the conventional method of calculating D2 makes this standard procedure
difficult to implement. As discussed by many authors [23-25],
the standard correlation dimension analysis is subjective, since the scaling region
used to compute D2 has to be chosen by the analyst. For some
data, especially those with inherent noise, different choices for the scaling region
can significantly change the estimate. This subjectivity does not allow
one to be certain that exactly the same criteria have been used to analyze
both the real and surrogate data, which is crucial to test the null hypothesis.

Several techniques have been proposed in the literature to overcome these difficulties.
Judd [24,26] has introduced an alternative method for the calculation of
D2 from the distribution of inter-point distances in an attractor. This method
avoids the problems associated with the scaling region, and among its many
advantages is that it is particularly effective for attractors with multiple scaling.
An extrapolation method has been proposed by Sprott and Rowlands [27]
to obtain a functional fit for D2 with R as a convergence parameter. While the
above methods have their advantages, it is still useful to develop modifications
of the standard GP algorithm to address these issues, primarily because of its
widespread applicability. Several attempts to improve the GP algorithm have been
proposed and applied to specific cases [28-34]. These include applying a
maximum likelihood estimator to the slopes taken at discrete points [29] and
smoothing C_M(R) using a Gaussian kernel [30]. While such techniques have
given robust results for particular systems, like the human α rhythm [32], they
still suffer either from the subjectivity of choosing a proper scaling region or
from computational complexity.

This motivates us to propose and implement a modified GP algorithm which
can be effectively used for the analysis of noisy time series, especially for the
surrogate analysis of data involving colored noise. The main modification in
this approach is to fix the scaling region algorithmically, so that for a given
finite time series, D2(M) and the saturated correlation dimension can be
computed in a "non-subjective manner". This ensures that exactly the same
procedure is used on the real and the surrogate data.

In the next section the modified scheme is described, while in §3 the scheme
is tested using time series generated from a number of standard analytical
low dimensional chaotic systems and pure colored noise. It is confirmed that
the computed D2(M) and D2^sat are consistent with theoretical values. The
variation of D2(M) with the number of data points used in the analysis is
studied, and the expected increase in D2(M) values with the addition of white
and colored noise is verified. In §4, surrogate data analysis is performed for
one of these standard systems, with and without additional colored noise, to
ascertain under what conditions the scheme can distinguish chaotic from noisy
data. Quantifiers are introduced to measure the difference in D2 values, which
can serve as benchmarking indices. As examples of real world applications,
data from two different physical systems are analyzed in detail in §5, and in
§6 the main results of this work are summarized and discussed.



2 Modified Algorithm 



The GP algorithm uses the delay embedding technique for the calculation of
D2. It creates an artificial space of dimension M with delay vectors constructed
from a discretely sampled scalar time series s(t_i) as

$$ \mathbf{x}_i = \left[\, s(t_i),\ s(t_i + \tau),\ \ldots,\ s(t_i + (M-1)\tau) \,\right] \qquad (1) $$



Here the delay time τ is chosen suitably such that the coordinates of the delay
vectors are not correlated. The relative number of points within a distance R
from a particular (i-th) data point is given by

$$ p_i(R) = \lim_{N_v \to \infty} \frac{1}{N_v} \sum_{j=1,\, j \neq i}^{N_v} H\!\left(R - |\mathbf{x}_i - \mathbf{x}_j|\right) \qquad (2) $$

where N_v is the total number of reconstructed vectors and H is the Heaviside
step function. Averaging this quantity over N_c randomly selected centers gives
the correlation function

$$ C_M(R) = \frac{1}{N_c} \sum_{i=1}^{N_c} p_i(R) \qquad (3) $$

The correlation dimension D2(M) is then defined to be the scaling index of
the variation of C_M(R) with R as R → 0. That is,

$$ D_2(M) = \lim_{R \to 0} \frac{d \log C_M(R)}{d \log R} \qquad (4) $$



For a finite data stream of length N, only a finite number of vectors, N_v = N −
(M − 1)τ, can be constructed, and hence the computed correlation dimension
will be a good approximation only if N is large. In practice, there are two
other complications that hinder the accurate computation of D2(M). First,
for small values of R, p_i(R) is small and hence is affected by noise
due to counting statistics. Second, for large values of R, a significant fraction of
the M-spheres used in the computation will typically extend beyond the attractor
region. This "edge effect" leads to an underestimation of C_M(R) for large R and
finally causes C_M(R) to saturate to unity. To avoid these effects in practice,
a linear part of the log C_M(R) versus log R plot, called the "scaling region",
is identified and its slope is taken to be D2. However, such an
exercise is subjective and specific to the data, especially for higher values of M.
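
For reference, the conventional procedure of Eqs. (1)-(4) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: τ is measured in samples, the Euclidean norm is used, each p_i(R) is normalised by N_v − 1 (the self-pair is excluded), and the function names are hypothetical. The local slopes it returns are what an analyst would otherwise inspect by eye to pick a scaling region.

```python
import numpy as np

def delay_embed(s, m, tau):
    """Delay vectors x_i = [s(t_i), s(t_i + tau), ..., s(t_i + (m-1) tau)], Eq. (1).
    tau is in samples, so N_v = N - (m - 1) * tau vectors are produced."""
    s = np.asarray(s, dtype=float)
    n_v = len(s) - (m - 1) * tau
    if n_v <= 0:
        raise ValueError("time series too short for this m and tau")
    return np.stack([s[k * tau : k * tau + n_v] for k in range(m)], axis=1)

def gp_correlation_sum(x, radii, n_centers, seed=None):
    """C_M(R) of Eqs. (2)-(3) with the Euclidean norm and random centres.
    Each p_i(R) is the fraction of the other N_v - 1 vectors within distance R
    (the text divides by N_v; the difference is negligible for large N_v)."""
    rng = np.random.default_rng(seed)
    radii = np.asarray(radii, dtype=float)
    c = np.zeros(len(radii))
    for i in rng.choice(len(x), size=n_centers, replace=False):
        d = np.delete(np.linalg.norm(x - x[i], axis=1), i)   # drop the j == i term
        c += (d[None, :] < radii[:, None]).mean(axis=1)      # Heaviside H(R - |x_i - x_j|)
    return c / n_centers

def local_slopes(radii, c):
    """Finite-difference estimate of d log C_M(R) / d log R, cf. Eq. (4)."""
    return np.diff(np.log(c)) / np.diff(np.log(radii))
```

Applied to uniformly distributed points in two dimensions, the local slopes at small R come out close to 2. As R grows they drift downwards, which is the edge effect described above.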

In our scheme, the original data set s(t_i) is first transformed to a uniform deviate,
s_u(t_i). Note that s_u(t_i) ranges from 0 to 1, which makes the volume of the
embedding space unity. In order to take the edge effects into account correctly,
it is convenient to redefine p(R) as the number of data points within an M-cube
(instead of an M-sphere) of length R around a data point. This is equivalent to
replacing the Euclidean norm by the maximum norm. Operationally this is
done by randomly choosing N_c data points as centers of M-cubes of length R.
Of these N_c M-cubes, only those which lie within the bounding box of the
embedded data are considered. Finally, the correlation sum C_M(R) is obtained
by averaging the number of data points within the accepted M-cubes. The requirement
that an M-cube lie within the embedding space ensures that there are
no edge effects due to the limited extent of the data. However, this also means that for
large values of R, only a small fraction of the original N_c M-cubes are taken
into consideration. Hence a maximum value of R, R_max, is fixed such that for
all R < R_max the number of M-cubes which satisfy the above criterion is at
least one-hundredth of the total number of vectors, i.e. at least N_v/100. To avoid the
region dominated by counting statistics, only values R > R_min, for which
N_v C_M(R) > 10, are taken into consideration; this ensures that on average
at least ten data points are counted per center. This makes sure that the
region R_min < R < R_max is affected by neither "edge effects" nor counting
statistics. Although the criteria used to compute R_min (i.e. N_v C_M(R) ≈ 10) and
R_max (i.e. number of M-cubes ≈ N_v/100) may not be optimal for every kind
of system, they do provide good estimates for all the systems we study
in the next section. Moreover, from a surrogate analysis point of view (see §4),
fixing these criteria a priori ensures that the same conditions are imposed on
the algorithm when estimating D2 for the data and for the corresponding surrogates.
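
A minimal Python sketch of this modified correlation sum is given below. It is an illustration under assumptions, not the code distributed by the authors. The uniform deviate is taken to be a rank transform, and R is interpreted as the maximum-norm radius of the M-cube (a side of 2R); using the side length instead only rescales R and leaves the log-log slopes unchanged. The names `to_uniform_deviate`, `cube_correlation_sum` and `scaling_window` are hypothetical. The two thresholds are the ones stated above: at least N_v/100 accepted cubes, and N_v C_M(R) ≥ 10.

```python
import numpy as np

def to_uniform_deviate(s):
    """Map s(t_i) to ranks scaled into [0, 1] (assumed form of the uniform deviate)."""
    s = np.asarray(s, dtype=float)
    ranks = np.argsort(np.argsort(s, kind="stable"), kind="stable")
    return ranks / (len(s) - 1)

def cube_correlation_sum(x, radii, n_centers, seed=None):
    """C_M(R) from M-cubes (maximum norm, radius R) centred on random data points.
    Cubes not lying wholly inside the unit box [0, 1]^M are rejected, so no
    accepted cube extends beyond the embedding space.
    Returns C_M(R) (NaN where no cube survives) and the accepted-cube counts."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n_v = len(x)
    ctrs = x[rng.choice(n_v, size=n_centers, replace=False)]
    # Maximum-norm distance from every centre to every vector: shape (n_centers, n_v)
    dist = np.empty((n_centers, n_v))
    for row, ctr in enumerate(ctrs):
        dist[row] = np.max(np.abs(x - ctr), axis=1)
    c = np.full(len(radii), np.nan)
    accepted = np.zeros(len(radii), dtype=int)
    for k, r in enumerate(radii):
        inside = np.all((ctrs - r >= 0.0) & (ctrs + r <= 1.0), axis=1)
        accepted[k] = inside.sum()
        if accepted[k] > 0:
            # neighbours strictly within R, minus the centre itself
            counts = np.sum(dist[inside] < r, axis=1) - 1
            c[k] = counts.mean() / n_v
    return c, accepted

def scaling_window(c, accepted, n_v):
    """Indices of radii with R_min <= R <= R_max: at least ~10 neighbours per centre
    (N_v * C_M(R) >= 10) and at least N_v / 100 accepted cubes."""
    enough_neighbours = n_v * np.nan_to_num(c) >= 10.0
    enough_cubes = accepted >= n_v / 100.0
    return np.flatnonzero(enough_neighbours & enough_cubes)
```

Because every accepted cube lies wholly inside the box, the slopes inside the window are free of the downward edge bias seen with the Euclidean-sphere version. The price is that fewer cubes survive as R grows.
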

C_M(R) is computed for several different values of R between R_min and R_max,
the logarithmic slope at each point is calculated, and the average is taken to be
D2(M). The error on D2(M) is estimated as the standard deviation of the slopes
about this average. This error is an estimate of how well the region used by the
scheme, R_min < R < R_max, can be represented by a linear scaling region. A
large error signifies that those values of R for which C_M(R) is not affected by
counting statistics and edge effects do not represent a single scaling region.
It should be noted that there often exists a critical embedding dimension M_cr
for which R_min ≈ R_max, and no significant results can then be obtained for
M > M_cr. Thus our algorithm fixes an upper limit on M up to which calculations
are to be repeated. For practical implementation of the above scheme, it
is sufficient to choose N_c = 0.1 N_v. The delay time τ is chosen to be the lag
at which the autocorrelation function falls to 1/e. With these values, D2(M)
for M = 1 to M = M_cr is computed for a given data stream, and a chi-square
fit is undertaken using a simple analytical function

$$ D_2(M) = \begin{cases} \dfrac{D_2^{sat}}{M_d}\, M & \text{for } M < M_d \\[4pt] D_2^{sat} & \text{for } M \ge M_d \end{cases} \qquad (5) $$

where M_d, the embedding dimension at which D2(M) saturates, is a second fit
parameter. The best fit value of D2^sat (obtained by minimizing χ²) is taken to be the
saturated correlation dimension, with errors corresponding to Δχ² = 1. Considering
the uncertainties in the computation and the statistics of the errors in
D2(M), a more sophisticated fitting procedure is perhaps not warranted. A
best fit value of D2^sat ≈ M_cr implies that no saturation of D2(M) was detected.
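
The averaging and fitting steps can be sketched in Python as follows. This is an illustration, not the authors' implementation, and several choices in it are assumptions:

- The fit function is taken to have the two-branch form D2^sat·M/M_d for M < M_d and D2^sat for M ≥ M_d.
- M_d is scanned on a grid. For fixed M_d the model is linear in D2^sat, so D2^sat has a closed-form weighted least-squares value.
- The Δχ² = 1 error on D2^sat is computed with M_d held at its best value, a simplification of a full two-parameter error analysis.

The function names are hypothetical.

```python
import numpy as np

def d2_from_window(radii, c):
    """D2(M) as the mean local slope d log C_M / d log R over the window,
    with the standard deviation of the slopes as its error."""
    slopes = np.diff(np.log(c)) / np.diff(np.log(radii))
    err = slopes.std(ddof=1) if len(slopes) > 1 else np.nan
    return slopes.mean(), err

def fit_saturation(m, d2, err, md_grid=None):
    """Chi-square fit of D2(M) = D2sat * min(M / Md, 1); returns (D2sat, error, Md, chi2)."""
    m, d2, err = (np.asarray(a, dtype=float) for a in (m, d2, err))
    w = 1.0 / err**2                          # chi-square weights 1/sigma^2
    if md_grid is None:
        md_grid = np.linspace(1.0, m.max(), 200)
    best = None
    for md in md_grid:
        g = np.minimum(m / md, 1.0)           # model shape for this Md
        s_gg = np.sum(w * g * g)
        d2sat = np.sum(w * g * d2) / s_gg     # linear least-squares amplitude
        chi2 = np.sum(w * (d2 - d2sat * g) ** 2)
        if best is None or chi2 < best[3]:
            # For a linear parameter, Delta chi2 = 1 gives sigma = 1/sqrt(sum w g^2)
            best = (d2sat, 1.0 / np.sqrt(s_gg), md, chi2)
    return best
```

A white-noise-like input, D2(M) = M for M = 1 to 10, drives the best M_d to the upper end of the grid and returns D2^sat ≈ 10 ≈ M_cr. By the criterion above, that means no saturation was detected.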

In summary, the algorithmic scheme first converts a data stream to a uniform
deviate. Next, the autocorrelation function is evaluated to estimate the
time delay τ. For each M, C_M(R) is evaluated using N_c = 0.1 N_v randomly
chosen centers. The limits R_min and R_max are estimated and D2(M)
is computed for the region from R_min to R_max. The process is repeated for
consecutive values of M until R_max ≈ R_min. The resultant D2(M) curve is fitted
using function (5), which returns the saturated correlation dimension D2^sat with
an error estimate. A numerical code which implements the scheme is available
at http://www.iucaa.ernet.in/~rmisra/NLD
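
The delay-selection step in this summary can be sketched as below. The sketch rests on three assumptions: "falls to 1/e" means the first lag at which the normalised autocorrelation is at or below 1/e, a biased FFT-based autocorrelation estimate is used, and τ is measured in samples. `delay_from_autocorrelation` is a hypothetical name, not part of the authors' code.

```python
import numpy as np

def delay_from_autocorrelation(s):
    """First lag (in samples) at which the normalised autocorrelation <= 1/e,
    or None if it never drops that far."""
    s = np.asarray(s, dtype=float) - np.mean(s)
    n = len(s)
    # Zero-padding to 2n turns the circular FFT correlation into a linear one
    f = np.fft.rfft(s, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f), n=2 * n)[:n]
    acf /= acf[0]                              # normalise so that acf[0] = 1
    below = np.flatnonzero(acf <= np.exp(-1.0))
    return int(below[0]) if below.size else None
```

For a sine wave with a period of 100 samples, this returns a delay of about 19 to 20 samples, close to the lag where cos(2πk/100) first falls to 1/e. For white noise it returns 1.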



3 Synthetic Data Analysis 

To illustrate the applicability of the scheme, it is used to analyse synthetic time
series generated from six well known low dimensional chaotic systems. Figure 1
shows the results of the analysis for a 30000 points long data stream generated
from the Rossler system. The solid lines in the C_M(R) versus R curves mark
the region between R_min and R_max that the scheme uses to compute D2(M).
In the standard scheme these regions are usually selected subjectively by the
analyst. As expected, the D2(M) curve is significantly different from that of
white noise, where D2(M) = M. The curve is therefore fitted using the function
defined in (5). The best fit function is shown as a dashed line along with
D2(M). The scheme returns D2^sat = 1.97 ± 0.17, which may be compared with
the value 1.99 ± 0.08 reported in the literature [27]. The exercise is repeated
for five other standard systems, and the resultant D2(M) curves are shown in
Figure 2. The D2^sat values for all six systems computed by our scheme are
shown in Table 1, along with the values taken from [27] for comparison in the
last column. The number of points used in all cases is 30000.

Table 1

System              Parameters                  D2^sat (this work)*   D2^sat [27]†
Rossler attractor   a = b = 0.2, c = 0.78       1.97 ± 0.17           1.99 ± 0.08
Lorenz attractor    σ = 10, r = 28, b = 8/3     2.03 ± 0.16           2.05 ± 0.1
Ueda attractor      damping 0.05, forcing 7.5   2.59 ± 0.1            2.67 ± 0.13
Henon map           a = 1.4, b = 0.3            1.23 ± 0.1            1.22 ± 0.04
Lozi map            a = 1.7, b = 0.5            1.41 ± 0.1            1.38 ± 0.05
Cat map             -                           2.00 ± 0.01           2.00 ± 0.06

* Values obtained using the scheme prescribed in this work. For all data
sets the number of points used is 30000.
† D2^sat taken from [27].

For a randomly generated data set (i.e. white noise), D2(M) = M, as also shown
in Figure 2. However, for colored random noise characterized by a power-law
spectrum, the D2(M) values saturate with M. The prescribed scheme should also
reproduce the expected values for such non-chaotic systems. This is demonstrated
in Figure 3, where the D2(M) curves are plotted for colored noise corresponding to
three different values of α. In all three cases, the computed values agree with the
theoretically expected values.

Fig. 1. The results of the analysis for 30000 points from the Rossler system. Left
panel: the C_M(R) curves for M = 1, 2, 5 and 9 (from left to right). The solid
lines indicate the scaling region which the scheme uses to compute D2(M). Right
panel: the D2(M) values (filled circles). The dashed line is the best fit curve, giving
D2^sat = 1.97 ± 0.17.

The results of the analysis are expected to depend on the number of points in
the data stream. As examples, we study the dependence of the computed saturated
dimension on the number of available data points for the Rossler,
Ueda and Henon systems. For the flows (Rossler and Ueda), the results also depend
on the total time T over which the governing differential equations have
been evolved. If T is small, the system will not have traced out its entire
attractor region, and hence an incorrect value of D2(M) would be computed
even if the number of available points is large. In Figure 4(a), the variation
of D2^sat with N is plotted. Here the total time is held constant at T = 300
and 1500 for the Rossler and Ueda systems respectively. The data are then
sampled at different time intervals δT to obtain different numbers of
available points, i.e. N = T/δT. Figure 4(b) shows the dependence of
D2^sat on the total time T when the number of data points is held constant at
N = 10000. As can be seen in the figure, for the Rossler system the technique
computes reasonable values of D2^sat even when N ≈ 1000, provided the total
time T > 300. However, these values depend on the system and the type of
attractor, since for the Ueda system reasonable values are obtained only when
N > 20000 and the total time T > 800. This is because of the difference in
the inherent Poincaré times involved. For maps and colored noise, there is no
inherent time scale in the system, and hence the only relevant parameter is the
number of data points. The variation of D2^sat for the Henon map is also shown
in Figure 4(a).

Fig. 2. The D2(M) values (filled circles) for random data (white noise) and five
standard low dimensional chaotic systems, each having 30000 points. The dashed
lines represent the best fit curves giving the values tabulated in Table 1.

Fig. 3. The D2(M) values for colored noise data, each having 30000 points,
corresponding to spectral index α = 1 (circles), 1.5 (squares) and 2 (triangles). The
dashed lines are the best fit curves. For α = 1 the curve is consistent with white
noise, while for α = 1.5 and 2.0, D2^sat = 3.75 ± 0.25 and 2.08 ± 0.16 respectively. The
theoretical value is D2^sat = 2/(α − 1), which gives ∞, 4 and 2 for α = 1.0, 1.5 and
2.0 respectively.

The addition of noise to the data from a chaotic system is expected to increase
the value of D2(M). This is illustrated in Figure 5, where D2^sat is plotted
for the Rossler system with different percentages of white and colored noise
added. Since colored noise has intrinsically low values of D2(M), the increase
in D2^sat is smaller for the addition of colored noise than for white noise. The
addition of even 50% red noise (α = 2.0) increases D2^sat only to about 3.
This emphasizes the need for surrogate data analysis to differentiate
chaotic systems contaminated with noise from purely stochastic ones.

Fig. 4. (a) The saturated correlation dimension D2^sat versus number of data points
for the Ueda (triangles), Rossler (circles) and Henon (squares) systems. The total time
evolved is T = NδT = 300 and 1500 for the Rossler and Ueda systems respectively.
(b) The saturated correlation dimension D2^sat versus the total time T for the Ueda
(triangles) and Rossler (circles) systems. The total number of points is fixed at
N = 10000. The dashed lines indicate the standard D2 values.

Fig. 5. The saturated correlation dimension D2^sat for the Rossler system with
different percentages of noise added. The four figures correspond to the addition of
colored noise with different power spectral indices α. White noise has α = 0 and red
noise has α = 2. The number of points in the data stream is 10000.

4 Surrogate Data Analysis 



Surrogate data analysis is perhaps the first important analysis that needs
to be undertaken on time series data to detect the presence of non-trivial
structure. The basic idea is to formulate a null hypothesis that the data has
been created by a stationary linear stochastic process, and then to attempt
to reject this hypothesis by comparing results for the data with appropriate
realizations of surrogate data. Ideally, surrogate data sets should have the
same power spectrum and distribution of values as the real data. The amplitude
adjusted Fourier transform (AAFT) method for generating such surrogates was
originally proposed by Theiler and co-workers [20]. More recently, Schreiber and
Schmitz [21,22] have proposed an iterative scheme, known as IAAFT, which is similar
but reported to be more consistent in representing the null hypothesis [35]. In this
work, we apply this scheme to generate ten surrogate data sets for each analysis,
using the TISEAN package [36,37].

Fig. 6. Correlation dimension D2 versus M for two chaotic systems (Rossler and
Lorenz) and for colored noise with α = 1.5 and 2.0 (filled circles connected by solid
lines). The corresponding D2(M) for ten surrogates are shown by dashed lines. The
error bars are omitted for clarity. The normalised mean sigma deviation nmsd (see
text) between the real and surrogate data is 36.1 and 10.2 for the chaotic systems
Rossler and Lorenz respectively, while for the colored noise nmsd = 0.68 and
2.00 for α = 1.5 and 2.0 respectively.
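
For readers without TISEAN, the iteration behind IAAFT can be sketched in a few lines of Python. Each pass imposes the Fourier amplitudes of the data and then, by rank-order remapping, its exact distribution of values. This is an illustrative re-implementation of the idea, not TISEAN's code. It may differ from TISEAN in details such as the starting point, the stopping rule and end-point handling. The function name and the fixed iteration count are assumptions.

```python
import numpy as np

def iaaft_surrogate(x, n_iter=100, seed=None):
    """Iterative amplitude adjusted Fourier transform surrogate (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    target_amp = np.abs(np.fft.rfft(x))        # Fourier amplitudes of the data
    y = rng.permutation(x)                     # random shuffle as the starting point
    for _ in range(n_iter):
        # Step 1: keep the current phases, impose the original amplitudes
        phases = np.angle(np.fft.rfft(y))
        y = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(x))
        # Step 2: rank-order remap so the values are exactly those of x
        ranks = np.argsort(np.argsort(y))
        y = sorted_x[ranks]
    return y
```

Ending on the rank-order step makes each surrogate an exact permutation of the data. Its power spectrum then matches the data only approximately.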

In Figure 6, the D2(M) values for data from two chaotic systems (Rossler
and Lorenz) and two kinds of pure colored noise are shown along with the
corresponding surrogates. In all analyses in this section, 10000 data points are
used. As expected, the results for the chaotic systems show a clear deviation
from their surrogates, while for pure colored noise the results are similar to
the surrogates, and hence the null hypothesis cannot be rejected. The addition
of noise to the chaotic system is found to decrease the difference between the
D2(M) of the data and the surrogates. This is shown in Figure 7, where D2(M)
for data and surrogates are compared for the Rossler system with different
percentages of red and white noise added to the time series. Visual
inspection of Figure 7 (upper panel) reveals that when white noise is added
to the system, D2(M) increases for both the data and the surrogates. There
is a noticeable difference between the data and surrogates at a contamination
level of 20%, but at a larger level of 50% the data and the surrogates are no
longer distinguishable. Red noise contamination is more interesting (Figure
7, lower panel), since for pure red noise (i.e. α = 2.0) the saturated correlation
dimension D2^sat = 2, which is roughly the same as that for the Rossler
system. Thus the D2(M) values for the surrogates decrease as the percentage of
contamination increases, while the D2(M) values for the data increase.
Nevertheless, even at a noise level of 50%, the data and the surrogates are
still distinguishable, in contrast to the case when white noise is added to
the system. This result has also been verified for other values of α.

Fig. 7. The effect of the addition of white (α = 0) and red (α = 2.0) noise on surrogate
analysis of data from the Rossler system. The D2(M) values for the data are
represented by filled circles connected by solid lines, while for the corresponding ten
surrogates the curves are represented by dashed lines. The upper panels (a) and (b)
are for white noise contamination at 20% and 50% respectively. The lower panels (c)
and (d) are for red noise contamination at 20% and 50% respectively.

Fig. 8. The normalised mean sigma deviation nmsd between data from the Rossler
system and ten surrogates versus the percentage of noise added to the data. Circles:
white noise (α = 0); squares: colored noise (α = 1.5); triangles: colored noise
(α = 2.0).

The above mentioned differences may appear highly qualitative, and hence a
quantification is attempted by defining a normalised mean sigma deviation,
nmsd. For this, the average of D2^surr(M), denoted here as ⟨D2^surr(M)⟩, is
estimated using a number of realizations of the surrogate data. Then

$$ \mathrm{nmsd}^2 = \frac{1}{M_{max}} \sum_{M=1}^{M_{max}} \left( \frac{D_2(M) - \langle D_2^{surr}(M) \rangle}{\sigma^{surr}(M)} \right)^2 \qquad (6) $$

where M_max is the maximum embedding dimension for which the analysis is
undertaken and σ^surr(M) is the standard deviation of D2^surr(M). The
normalised mean sigma deviation is nmsd = 36.1 and 10.2 for the Rossler and
Lorenz systems respectively, while nmsd = 0.6 and 2.0 for the two cases of
colored noise (Figure 6). Thus a conservative threshold of nmsd = 3 can be
imposed, such that when nmsd is greater than 3, the data can be considered
distinguishable from its surrogates. The variation of nmsd for different
percentages of white and colored noise added to the Rossler system is shown
in Figure 8. Note that for the same percentage of contamination, nmsd is larger
for colored noise than for white noise.
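
The nmsd statistic and the threshold of 3 translate directly into a few lines of Python. This sketch is an illustration rather than the authors' code. The statistic is restated in the docstring. The surrogate standard deviation is assumed to be the sample standard deviation over realizations (ddof = 1). The names `nmsd` and `distinguishable` are hypothetical.

```python
import numpy as np

def nmsd(d2_data, d2_surrogates):
    """nmsd = sqrt( mean over M of ((D2(M) - <D2_surr(M)>) / sigma_surr(M))^2 ).
    d2_data: shape (M_max,); d2_surrogates: shape (n_surrogates, M_max)."""
    d2_data = np.asarray(d2_data, dtype=float)
    surr = np.asarray(d2_surrogates, dtype=float)
    mean_surr = surr.mean(axis=0)              # <D2_surr(M)>
    sigma_surr = surr.std(axis=0, ddof=1)      # spread over surrogate realizations
    z = (d2_data - mean_surr) / sigma_surr
    return float(np.sqrt(np.mean(z ** 2)))

def distinguishable(d2_data, d2_surrogates, threshold=3.0):
    """Conservative rule from the text: nmsd > 3 means data differ from surrogates."""
    return nmsd(d2_data, d2_surrogates) > threshold
```

With ten surrogates, `d2_surrogates` would be a 10 × M_max array of D2(M) values computed with exactly the same scheme as the data. That is the situation the non-subjective algorithm is designed to guarantee.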



5 Experimental Data Analysis 



Finally, we consider two data streams, each obtained from a different real world
experiment: one from an astrophysical X-ray object and the other EEG
data of the human brain. In both cases, the continuous data stream
has N < 5000 points and is expected to have noise contamination of unknown
type and level. The X-ray data is taken from GRS 1915+105, which is
a highly variable black hole system. Its temporal behavior has been classified
into 12 different states, and it shows signatures of low dimensional chaotic
behavior in some of these states [38]. Here we choose representative data sets
from two different spectral classes, namely the β state and the γ state, both
consisting of 3200 data points. The data has been extracted with a time resolution
of one second to reduce the effect of Poisson noise. The X-ray light curves are
shown in Figure 9, while Figure 10 shows the D2(M) curves for the data and
the surrogates in both cases, obtained by applying our scheme. Here the nmsd
is found to be 7.02 and 0.89 for β and γ respectively, indicating that the null
hypothesis can be rejected for the β state.

There are indications of nonlinear signatures in the dynamics of
the human brain's electrical activity, particularly during an epileptic seizure.
Hence we choose two EEG data sets of the human brain, one recorded during a seizure
and the other in a healthy state; both consist of 4098 data points. The
data has been studied earlier, and further details regarding its analysis can be
found in Andrzejak et al. [39]. The EEG time series for both cases are shown
in Figure 11, and the results of our analysis are shown in Figure 12. The curves
are clearly different for the data and the surrogates for the seizure signal, with
nmsd = 20.1, while the healthy signal behaves as noise with nmsd = 0.64,
consistent with the earlier analysis in [39].

Fig. 9. The light curves from two temporal states, β and γ, of the black hole system
GRS 1915+105; see text for details.



6 Discussion 

In this paper we have implemented a modification of the conventional GP algorithm
and calculated the correlation dimension in a non-subjective manner.
The method is most suitable for surrogate data analysis, where it is imperative
that the same conditions are maintained in the algorithm for the data and the
surrogates. Moreover, it can be applied to any arbitrary time series with a
few thousand data points, and it provides an error estimate on the value of D2^sat
obtained.

Fig. 10. The D2(M) values for the data (filled circles connected by a line) and the
corresponding values for the surrogates (dashed lines) for the states shown in Figure 9.

The scheme is tested for standard low dimensional chaotic systems and for
pure colored noise, and it is found that the computed D2(M) are close to the
standard values in all cases. As expected, the addition of noise to the data
from chaotic systems increases the correlation dimension D2(M). The scheme
can differentiate between noise contaminated data and the corresponding
surrogates when the percentage of noise addition is low. The level of
noise contamination up to which this differentiation can be made depends on
the type of noise. Contamination by red noise (which intrinsically
has a low saturated correlation dimension) affects the surrogate data analysis
less than the addition of white noise, i.e. for the same percentage of noise
addition, data with colored noise is more distinguishable from the corresponding
surrogates than data with white noise. This implies that in those practical
situations where experimental data have colored (or an unknown type of) noise
combined with the real signal, the present scheme can be used effectively. As
examples of application, data sets from two scientific experiments are analyzed
and the nature of their variability ascertained.

Fig. 11. The EEG time series for a seizure period and the healthy state.

It is also important at this stage to highlight a few possible limitations of
this scheme when applying it to specific data sets. The scheme assumes that
the scaling exponent of C_M(R) with R is independent of R, and hence the limit
R → 0 is not taken. This is particularly important for those systems where
the scaling is significantly different for small and large R values [24]. The
method of computing the time delay τ in this scheme need not be optimal for
some specific data sets. Nevertheless, under certain real life conditions, such as
when there are many data sets, when a change in the state of a system needs
to be evaluated quickly, or when qualitative differences between time series
need to be estimated, the present scheme is recommended
as a useful tool to compute the correlation dimension (and compare with
surrogates) without non-algorithmic interventions.

Fig. 12. The D2(M) values for the data (filled circles) and the corresponding
surrogates for the two EEG time series shown in Figure 11.